FIELD OF THE INVENTION
The current invention relates to the processing of images such as photographs, 
drawings, and other two dimensional displays. It further relates to the processing of such images 
which are captured in digital format or after they have been converted to or expressed in digital 
format. This invention further relates to use of novel coding methods to increase the speed and 
compression ratio for digital image storage and transmission while avoiding introduction of 
undesirable artifacts into the reconstructed images. 

BACKGROUND OF THE INVENTION 

In general, image processing is the analysis and manipulation of two-dimensional 
representations, which can comprise photographs, drawings, paintings, blueprints, x-rays of 
medical patients, or indeed abstract art or artistic patterns. These images are all two-dimensional 
arrays of information. Until fairly recently, images have comprised almost exclusively analog 
displays of analog information, for example, conventional photographs and motion pictures. Even 
the signals encoding television pictures, notwithstanding that the vertical scan comprises a finite 
number of lines, are fundamentally analog in nature. 

Beginning in the early 1960's, images began to be captured or converted and 
stored as two-dimensional digital data, and digital image processing followed. At first, images 
were recorded or transmitted in analog form and then converted to digital representation for 
manipulation on a computer. Currently digital capture and transmission are on their way to 
dominance, in part because of the advent of charge coupled device (CCD) image recording arrays 
and in part because of the availability of inexpensive high speed computers to store and 
manipulate images. 

An important task of image processing is the correction or enhancement of a 
particular image. For example, digital enhancement of images of celestial objects taken by space 
probes has provided substantial scientific information. However, the current invention relates 
primarily to compression for transmission or storage of digital images and not to enhancement. 

One of the problems with digital images is that a complete single image frame can
require up to several megabytes of storage space or transmission bandwidth. That is, one of
today's 3-1/2 inch floppy discs can hold at best a little more than one gray-scale frame and 
sometimes substantially less than one whole frame. A full-page color picture, for example, 
uncompressed, can occupy 30 megabytes of storage space. Storing or transmitting the vast 
amounts of data which would be required for real-time uncompressed high resolution digital video 
is technologically daunting and virtually impossible for many important communication channels, 
such as the telephone line. The transmission of digital images from space probes can take many 
hours or even days if insufficiently compressed images are involved. Accordingly, there has been 
a decades long effort to develop methods of extracting from images the information essential to 
an aesthetically pleasing or scientifically useful picture without degrading the image quality too 
much and especially without introducing unsightly or confusing artifacts into the image. 



The basic approach has usually involved some form of coding of picture intensities
coupled with quantization. One approach is block coding; another approach, mathematically
equivalent with proper phasing, is multiphase filter banks. Frequency based multi-band transforms
have long found application in image coding. For instance, the JPEG image compression
standard, W. B. Pennebaker and J. L. Mitchell, "JPEG: Still Image Compression Standard,"
Van Nostrand Reinhold, 1993, employs the 8 x 8 discrete cosine transform (DCT) at its
transformation stage. At high bit rates, JPEG offers almost lossless reconstructed image quality.
However, when more compression is needed, annoying blocking artifacts appear since the DCT
bases are short and do not overlap, creating discontinuities at block boundaries.

The wavelet transform, on the other hand, with long, varying-length, and
overlapping bases, has elegantly solved the blocking problem. However, the transform's
computational complexity can be significantly higher than that of the DCT. This complexity gap
is partly in terms of the number of arithmetical operations involved, but more importantly, in
terms of the memory buffer space required. In particular, some implementations of the wavelet
transform require many more operations per output coefficient as well as a large buffer.

An interesting alternative to wavelets is the lapped transform, e.g., H. S. Malvar,
Signal Processing with Lapped Transforms, Artech House, 1992, where pixels from adjacent
blocks are utilized in the calculation of transform coefficients for the working block. The lapped
transforms outperform the DCT on two counts: (i) from the analysis viewpoint, they take into
account inter-block correlation and hence provide better energy compaction; (ii) from the
synthesis viewpoint, their overlapping basis functions decay asymptotically to zero at the ends,
reducing blocking discontinuities dramatically.
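
For reference, the block transform that the lapped transforms are compared against can be sketched in a few lines. The following is a minimal pure-Python illustration of the orthonormal 8-point DCT-II applied to one row of samples; the names `dct_matrix`, `forward`, and `inverse` and the sample values are illustrative assumptions, not taken from the JPEG standard or the cited references. Each block is transformed using only its own 8 samples, which is why the basis functions of adjacent blocks do not overlap.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def forward(block, c):
    """Transform coefficients y = C x for one block of samples."""
    return [sum(c[k][j] * block[j] for j in range(len(block)))
            for k in range(len(c))]

def inverse(coeffs, c):
    """C is orthogonal, so the inverse transform is the transpose: x = C^T y."""
    n = len(coeffs)
    return [sum(c[k][j] * coeffs[k] for k in range(n)) for j in range(n)]

C = dct_matrix(8)
x = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of an 8x8 block (made-up values)
y = forward(x, C)                      # y[0] is the DC coefficient
x_rec = inverse(y, C)                  # reconstructs x up to rounding error
```

A two-dimensional 8 x 8 transform applies the same matrix first along the rows and then along the columns of the block.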

Nevertheless, lapped transforms have not yet been able to supplant the unadorned 
DCT in international standard coding routines. The principal reason is that the modest 
improvement in coding performance available up to now has not been sufficient to justify the 
significant increase in computational complexity. In the prior art, therefore, lapped transforms 
remained too computationally complex for the benefits they provided. In particular, the previous
lapped transforms somewhat reduced but did not eliminate the annoying blocking artifacts.

It is therefore an object of the current invention to provide a new transform which 
is simple and fast enough to replace the bare DCT in international standards, in particular in JPEG 
and MPEG-like coding standards. It is another object of this invention to provide an image 
transform which has overlapping basis functions so as to avoid blocking artifacts. It is a further
object of this invention to provide a lapped transform which is approximately as fast as, but more
efficient for compression than, the bare DCT. It is yet another object of this invention to provide
dramatically improved speed and efficiency using a lapped transform with lifting steps in a 
butterfly structure with dyadic-rational coefficients. It is yet a further object of this invention to 
provide a transform structure such that for a negligible complexity surplus over the bare DCT a 
dramatic coding performance gain can be obtained both from a subjective and objective point of 
view while blocking artifacts are completely eliminated. 

SUMMARY OF THE INVENTION 
In the current invention, we use a family of lapped biorthogonal transforms 
implementing a small number of dyadic-rational lifting steps. The resulting transform, called the 
LiftLT, not only has high computation speed but is well-suited to implementation via VLSI. 
Moreover, it also consistently outperforms state-of-the-art wavelet based coding systems in
coding performance when the same quantizer and entropy coder are used. The LiftLT is a lapped
biorthogonal transform using lifting steps in a modular lattice structure, the result of which is a 
fast, efficient, and robust encoding system. With only 1 more multiplication (which can also be 
implemented with shift-and-add operations), 22 more additions, and 4 more delay elements 
compared to the bare DCT, the LiftLT offers a fast, low-cost approach capable of straightforward 
VLSI implementation while providing reconstructed images which are high in quality, both 
objectively and subjectively. Despite its simplicity, the LiftLT provides a significant improvement 
in reconstructed image quality over the traditional DCT in that blocking is completely eliminated 
while at medium and high compression ratios ringing artifacts are reasonably contained. The 
performance of the LiftLT surpasses even that of the well-known 9/7-tap biorthogonal wavelet 
transform with irrational coefficients. The LiftLT's block-based structure also provides several 
other advantages: supporting parallel processing mode, facilitating region-of-interest coding and 
decoding, and processing large images under severe memory constraints. 

Most generally, the current invention is an apparatus for block coding of windows
of digitally represented images comprising a chain of lattices of lapped transforms with dyadic
rational lifting steps. More particularly, this invention is a system of electronic devices which
codes, stores or transmits, and decodes M x M sized blocks of digitally represented images, where
M is a power of 2. The main block transform structure comprises a transform having M channels
numbered 0 through M-1, half of said channel numbers being odd and half being even; a
normalizer with a dyadic rational normalization factor in each of said M channels; two lifting steps
with a first set of identical dyadic rational coefficients connecting each pair of adjacent numbered
channels in a butterfly configuration; M/2 delay lines in the odd numbered channels; two inverse
lifting steps with the first set of dyadic rational coefficients connecting each pair of adjacent
numbered channels in a butterfly configuration; and two lifting steps with a second set of identical
dyadic rational coefficients connecting each pair of adjacent odd numbered channels; means for
transmission or storage of the transform output coefficients; and an inverse transform comprising
M channels numbered 0 through M-1, half of said channel numbers being odd and half being even;
two inverse lifting steps with dyadic rational coefficients connecting each pair of adjacent odd
numbered channels; two lifting steps with dyadic rational coefficients connecting each pair of
adjacent numbered channels in a butterfly configuration; M/2 delay lines in the even numbered
channels; two inverse lifting steps with dyadic rational coefficients connecting each pair of
adjacent numbered channels in a butterfly configuration; a denormalizer with a dyadic rational
inverse normalization factor in each of said M channels; and a base inverse transform having M
channels numbered 0 through M-1.



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a polyphase representation of a linear phase perfect reconstruction filter bank.

Figure 2 shows the most general lattice structure for linear phase lapped transforms with filter
length L = KM.

Figure 3 shows the parameterization of an invertible matrix via the singular value decomposition.

Figure 4 portrays the basic butterfly lifting configuration.

Figure 5 depicts the analysis LiftLT lattice drawn for M = 8.

Figure 6 depicts the synthesis LiftLT lattice drawn for M = 8.

Figure 7 depicts a VLSI implementation of the analysis filter bank operations.

Figure 8 shows frequency and time responses of the 8x16 LiftLT: Left: analysis bank. Right:
synthesis bank.

Figure 9 portrays reconstructed "Barbara" images at 1:32 compression ratio.

DESCRIPTION OF THE PREFERRED EMBODIMENT
Typically, a block transform for image processing is applied to a block (or
window) of pixels, for example an 8x8 group, and the process is iterated over the entire image.
A biorthogonal transform in a block coder uses as a decomposition basis a complete set of basis
vectors, similar to an orthogonal basis. However, the basis vectors are more general in that they
may not be orthogonal to all other basis vectors. The restriction is that there is a "dual" basis to
the original biorthogonal basis such that every vector in the original basis is orthogonal to every
vector in the dual basis except its own "dual" vector. The basic idea of combining the concepts of
biorthogonality and lapped transforms has already appeared in the prior art. The most general
lattice for M-channel linear phase lapped biorthogonal transforms is presented in T. D. Tran, R.
de Queiroz, and T. Q. Nguyen, "The generalized lapped biorthogonal transform," ICASSP, pp.
1441-1444, Seattle, May 1998, and in T. D. Tran, R. L. de Queiroz, and T. Q. Nguyen, "Linear
phase perfect reconstruction filter bank: lattice structure, design, and application in image coding"
(submitted to IEEE Trans. on Signal Processing, Apr. 1998). A signal processing flow diagram
of this well-known generalized filter bank is shown in Fig. 2.
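
The relationship between a biorthogonal basis and its dual can be illustrated with a small numerical sketch. The two-dimensional basis below, and the helper names `dot` and `dual_basis_2d`, are assumptions chosen only for illustration. The basis vectors are not orthogonal to each other, yet each dual vector has unit inner product with its own partner and zero inner product with the other basis vector, so analysis with the dual basis followed by synthesis with the original basis reconstructs the input exactly.

```python
def dot(a, b):
    """Inner product of two vectors given as tuples."""
    return sum(p * q for p, q in zip(a, b))

def dual_basis_2d(b0, b1):
    """Dual vectors are the rows of the inverse of the matrix with columns b0, b1."""
    det = b0[0] * b1[1] - b1[0] * b0[1]
    d0 = (b1[1] / det, -b1[0] / det)
    d1 = (-b0[1] / det, b0[0] / det)
    return d0, d1

# A basis whose vectors are not mutually orthogonal (illustrative values).
b0, b1 = (1.0, 0.0), (1.0, 1.0)
d0, d1 = dual_basis_2d(b0, b1)

x = (3.0, 5.0)
coeffs = (dot(d0, x), dot(d1, x))            # analysis with the dual basis
x_rec = tuple(coeffs[0] * p + coeffs[1] * q  # synthesis with the original basis
              for p, q in zip(b0, b1))
```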

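In the current invention, which we call the Fast LiftLT, we apply lapped transforms
based on using fast lifting steps in an M-channel uniform linear-phase perfect reconstruction filter
bank, according to the generic polyphase representation of Figure 1. In the lapped biorthogonal
approach, the polyphase matrix E(z) can be factorized as

\[ E(z) = G_{K-1}(z)\, G_{K-2}(z) \cdots G_1(z)\, E_0(z), \qquad (1) \]

where

\[ G_i(z) = \frac{1}{2} \begin{bmatrix} U_i & 0 \\ 0 & V_i \end{bmatrix} \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & z^{-1} I \end{bmatrix} \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \equiv \frac{1}{2}\, \Phi_i\, W\, \Lambda(z)\, W, \qquad (2) \]

\[ E_0(z) = \frac{1}{\sqrt{2}} \begin{bmatrix} U_0 & U_0 J_{M/2} \\ V_0 J_{M/2} & -V_0 \end{bmatrix}. \qquad (3) \]

In these equations, I is the identity matrix and J is the reversal matrix, with ones on the
anti-diagonal and zeros elsewhere.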

The transform decomposition expressed by equations (1) through (3) is readily
represented, as shown in Figure 2, as a complete lattice replacing the "analysis" filter bank E(z) of
Figure 1. This decomposition results in a lattice of filters having length L = KM. (K is often called
the overlapping factor.) Each cascading structure G_i(z) increases the filter length by M. All U_i
and V_i, i = 0, 1, ..., K-1, are arbitrary M/2 x M/2 invertible matrices. According to a theorem well
known in the art, invertible matrices can be completely represented by their singular value
decomposition (SVD), given by

\[ U_i = U_{i0}\, \Gamma_i\, U_{i1}, \qquad V_i = V_{i0}\, \Delta_i\, V_{i1}, \]

where U_{i0}, U_{i1}, V_{i0}, V_{i1} are diagonalizing orthogonal matrices and \Gamma_i, \Delta_i are diagonal matrices
with positive elements.

It is well known that any M/2 x M/2 orthogonal matrix can be factorized into
M(M-2)/8 plane rotations θ_i and that the diagonal matrices represent simply scaling factors.
Accordingly, the most general LT lattice consists of KM(M-2)/2 two-dimensional rotations and
2M diagonal scaling factors. The factorization of an orthogonal matrix into a sequence of pairwise
plane rotations θ_i is shown in Fig. 3.
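
The rotation counts quoted above can be checked with a short calculation. The sketch below assumes the standard result that a general n x n orthogonal matrix factors into n(n-1)/2 plane rotations, and that each lattice stage contributes four such M/2 x M/2 orthogonal factors (two from the SVD of each of the two invertible matrices of the stage). The function names are illustrative.

```python
def rotations_in_orthogonal(n):
    """Plane rotations needed to factor a general n x n orthogonal matrix."""
    return n * (n - 1) // 2

def lattice_rotation_counts(M, K):
    """Return (rotations per orthogonal factor, rotations in a K-stage lattice)."""
    per_factor = rotations_in_orthogonal(M // 2)   # equals M(M-2)/8
    per_stage = 4 * per_factor                     # four orthogonal factors per stage
    return per_factor, K * per_stage               # total equals K*M*(M-2)/2

per_factor, total = lattice_rotation_counts(M=8, K=2)
print(per_factor, total)  # 6 48
```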



It is also well known that a plane rotation can be performed by 3 "shears":

\[ \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} = \begin{bmatrix} 1 & \frac{\cos\theta_i - 1}{\sin\theta_i} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \sin\theta_i & 1 \end{bmatrix} \begin{bmatrix} 1 & \frac{\cos\theta_i - 1}{\sin\theta_i} \\ 0 & 1 \end{bmatrix}. \]

The signal processing flow diagram of this operation is shown in Fig. 4. The crossing
arrangement of these flow paths is also referred to as a butterfly configuration. Each of the above
"shears" can be written as a lifting step.

Combining the foregoing, the shears referred to can be expressed as
computationally equivalent "lifting steps" in signal processing. In other words, we can replace
each "rotation" by 3 closely-related lifting steps with butterfly structure. It is possible therefore to
implement the complete LT lattice shown in Figure 2 by 3KM(M-2)/2 lifting steps and 2M scaling
multipliers.
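
The equivalence between a plane rotation and three shears can be verified numerically. The following sketch assumes the common factorization in which the outer two shears use the coefficient (cos θ - 1)/sin θ and the middle shear uses sin θ; the sign convention of the rotation and the function names are assumptions made for illustration. Each assignment modifies one channel using only the other channel, which is the form of a lifting step.

```python
import math

def rotate_direct(x, y, theta):
    """Plane rotation computed with four multiplications."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

def rotate_by_lifting(x, y, theta):
    """The same rotation as three shears (requires sin(theta) != 0)."""
    a = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x + a * y   # first shear: update x from y
    y = y + s * x   # second shear: update y from the new x
    x = x + a * y   # third shear: update x from the new y
    return x, y

direct = rotate_direct(3.0, -2.0, 0.7)
lifted = rotate_by_lifting(3.0, -2.0, 0.7)
```

The lifting form uses three multiplications instead of four, and each step is undone simply by subtracting what was added.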

In the simplest but currently preferred embodiment, to minimize the complexity of
the transform we choose a small overlapping factor K=2 and set the initial stage E_0 to be the DCT
itself. Many other coding transforms can serve for the base stage instead of the DCT, and it
should be recognized that many other embodiments are possible and can be implemented by one
skilled in the art of signal processing.

Following the observation in H. S. Malvar, "Lapped biorthogonal transforms for
transform coding with reduced blocking and ringing artifacts," ICASSP97, Munich, April 1997,
we apply a scaling factor to the first DCT's antisymmetric basis to generate synthesis LT basis
functions whose end values decay smoothly to exact zero, a crucial advantage in eliminating
blocking artifacts. However, instead of scaling the analysis by √2 and the synthesis by 1/√2, we
opt for 25/16 and its inverse 16/25 since they allow the implementation of both analysis and
synthesis banks in integer arithmetic. Another value that works almost as well as 25/16 is 5/4. To
summarize, the following choices are made in the first stage: the combination of U_00 and
V_00 with the previous butterfly form the DCT; Δ_0 = diag[25/16, 1, ..., 1], and Γ_0 = U_01 = V_01 = I_{M/2}. See
Fig. 2.

After 2 series of ±1 butterflies W and the delay chain Λ(z), the LT symmetric
basis functions already have good attenuation, especially at DC (ω = 0). Hence, we can
comfortably set U_1 = I_{M/2}.
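
The benefit of choosing the rational pair 25/16 and 16/25 for the first-stage scaling can be seen in a brief sketch. It uses Python's standard `fractions` module to show that the two factors are exact inverses, and it shows one possible shift-and-add realization of the multiplication by 25/16 on an integer sample, using 25 = 16 + 8 + 1. The truncation toward negative infinity implied by the right shift is an assumption of this sketch; the text does not specify a rounding rule.

```python
from fractions import Fraction

ANALYSIS_SCALE = Fraction(25, 16)    # scaling applied in the analysis bank
SYNTHESIS_SCALE = Fraction(16, 25)   # inverse scaling in the synthesis bank

def scale_by_25_over_16(x):
    """floor(25*x/16) for an integer x, using only shifts and adds."""
    return ((x << 4) + (x << 3) + x) >> 4

product = ANALYSIS_SCALE * SYNTHESIS_SCALE   # exactly 1, no rounding error
scaled = scale_by_25_over_16(48)             # 48 * 25 / 16
```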

As noted, V_1 is factorizable into a series of lifting steps and diagonal scalings.
However, there are several problems: (i) the large number of lifting steps is costly in both speed
and physical real-estate in VLSI implementation; (ii) the lifting steps are related; and (iii) it is not
immediately obvious what choices of rotation angles will result in dyadic rational lifting
multipliers. In the current invention, we approximate V_1 by (M/2) - 1 combinations of block-
diagonal predict-and-update lifting steps, each combination consisting of a pair of 2x2 triangular
lifting matrices with parameters u_i and p_i.

Here, the free parameters u_i and p_i can be chosen arbitrarily and independently
without affecting perfect reconstruction. The inverses are trivially obtained by switching the
order and the sign of the lifting steps. Unlike popular lifting implementations of various wavelets,
all of our lifting steps are of zero-order, namely operating in the same time epoch. In other
words, we simply use a series of 2x2 upper or lower triangular matrices to parameterize the
invertible matrix V_1.
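
The claim that the inverse is obtained by switching the order and the sign of the lifting steps can be demonstrated on a single pair of channels. In the sketch below, the parameter names `u` and `p`, the sign conventions, and the order of the predict and update steps are assumptions made for illustration. Any values of the two parameters give perfect reconstruction, because each step changes one channel using only the other, unchanged channel.

```python
def lift_forward(a, b, u, p):
    """Predict then update on a channel pair (zero-order: same time index)."""
    b = b - p * a   # predict step: lower triangular 2x2 matrix
    a = a + u * b   # update step: upper triangular 2x2 matrix
    return a, b

def lift_inverse(a, b, u, p):
    """Undo the two steps in reverse order with opposite signs."""
    a = a - u * b
    b = b + p * a
    return a, b

forward = lift_forward(5, 11, u=3, p=-7)
restored = lift_inverse(*forward, u=3, p=-7)
```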

Most importantly, fast-computable VLSI-friendly transforms are readily available
when u_i and p_i are restricted to dyadic rational values, that is, rational fractions having
(preferably small) powers of 2 denominators. With such coefficients, transform operations can for
the most part be reduced to a small number of shifts and adds. In particular, setting all of the
approximating lifting step coefficients to -1/2 yields a very fast and elegant lapped transform.
With this choice, each lifting step can be implemented using only one simple bit shift and one
addition.
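
A lifting step with a dyadic rational coefficient can be sketched in integer arithmetic as follows. The function names, the default coefficient of -1/2, and the use of an arithmetic right shift (which truncates toward negative infinity) are assumptions of this sketch. Even with truncation the step is exactly invertible, since the inverse recomputes the same shifted value from the unchanged channel and removes it.

```python
def lift_dyadic(a, b, shift=1, sign=-1):
    """a <- a + sign * floor(b / 2**shift): one bit shift and one add or subtract."""
    return a + sign * (b >> shift), b

def unlift_dyadic(a, b, shift=1, sign=-1):
    """Inverse step: the same shift with the opposite sign."""
    return a - sign * (b >> shift), b

lifted = lift_dyadic(10, 7)          # 10 - (7 >> 1)
recovered = unlift_dyadic(*lifted)
```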

The resulting LiftLT lattice structures are presented in Figures 5 and 6. The
analysis filter shown in Fig. 5 comprises a DCT block 1, 25/16 normalization 2, a delay line 3 on
four of the eight channels, a butterfly structured set of lifting steps 5, and a set of four fast dyadic
lifting steps 6. The frequency and impulse responses of the 8x16 LiftLT's basis functions are
depicted in Figure 8.

The inverse or synthesis lattice is shown in Fig. 6. This system comprises a set of
four fast dyadic lifting steps 11, a butterfly-structured set of lifting steps 12, a delay line 13 on
four of the eight channels, 16/25 inverse normalization 14, and an inverse DCT block 15. Fig. 8
also shows the frequency and impulse responses of the synthesis lattice.

The LiftLT is sufficiently fast for many applications, especially in hardware, since
most of the incrementally added computation comes from the 2 butterflies and the 6 shift-and-add
lifting steps. It is faster than the type-I fast LOT described in H. S. Malvar, Signal Processing
with Lapped Transforms, Artech House, 1992. Besides its low complexity, the LiftLT possesses
many characteristics of a high-performance transform in image compression: (i) it has high energy
compaction due to a high coding gain and a low attenuation near DC where most of the image
energy is concentrated; (ii) its synthesis basis functions also decay smoothly to zero, resulting in
blocking-free reconstructed images.

Comparisons of complexity and performance between the LiftLT and other 
popular transforms are tabulated in Table 1 and Table 2. The LiftLT's performance is already
very close to that of the optimal generalized lapped biorthogonal transform, while its complexity 
is the lowest amongst the transforms except for the DCT. 

To assess the new method in image coding, we compared images coded and 
decoded with four different transforms: 

DCT: 8-channel, 8-tap filters

Type-I Fast LOT: 8-channel, 16-tap filters

LiftLT: 8-channel, 16-tap filters

Wavelet: 9/7-tap biorthogonal.

In this comparison, we use the same SPIHT quantizer and entropy coder, A. Said and W. A.
Pearlman, "A new fast and efficient image coder based on set partitioning in hierarchical trees,"
IEEE Trans. on Circuits Syst. Video Tech., vol. 6, pp. 243-250, June 1996, for every transform.
In the block-transform cases, we use the modified zero-tree structure in T. D. Tran and T. Q.
Nguyen, "A lapped transform embedded image coder," ISCAS, Monterey, May 1998, where each
block of transform coefficients is treated analogously to a full wavelet tree and three more levels
of decomposition are employed to decorrelate the DC subband further.

Table 1 contains a comparison of the complexity of these four coding systems,
comparing numbers of operations needed per 8 transform coefficients:

Transform                 No. Multiplications   No. Additions   No. Shifts
8x8 DCT                   13                    29              0
8x16 Type-I Fast LOT      22                    54              0
9/7 Wavelet, 1-level      36                    56              0
8x16 Fast LiftLT          14                    51              6

In such a comparison, the number of multiplication operations dominates the "cost" of the
transform in terms of computing resources and time, and number of additions and number of shifts
have negligible effect. In this table, it is clear that the fast LiftLT is almost as low as the DCT in
complexity and more than twice as efficient as the wavelet transform.
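
The comparisons drawn from Table 1 follow from simple arithmetic on its entries, as the short sketch below illustrates. The dictionary layout and variable names are illustrative; the operation counts are those listed in Table 1.

```python
# (multiplications, additions, shifts) per 8 transform coefficients, from Table 1
TABLE_1 = {
    "8x8 DCT": (13, 29, 0),
    "8x16 Type-I Fast LOT": (22, 54, 0),
    "9/7 Wavelet, 1-level": (36, 56, 0),
    "8x16 Fast LiftLT": (14, 51, 6),
}

dct = TABLE_1["8x8 DCT"]
liftlt = TABLE_1["8x16 Fast LiftLT"]
wavelet = TABLE_1["9/7 Wavelet, 1-level"]

extra_mults = liftlt[0] - dct[0]          # additional multiplications over the DCT
extra_adds = liftlt[1] - dct[1]           # additional additions over the DCT
wavelet_ratio = wavelet[0] / liftlt[0]    # multiplication count, wavelet vs LiftLT
print(extra_mults, extra_adds, round(wavelet_ratio, 2))  # 1 22 2.57
```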

Table 2 sets forth a number of different performance measures for each of the four
coding methods:

Transform               Coding Gain (dB)   DC Atten. (-dB)   Stopband Atten. (-dB)   Mir. Freq. Atten. (-dB)
8x8 DCT                 8.83               310.62            9.96                    322.1
8x16 Type-I Fast LOT    9.2                309.04            17.32                   314.7
9/7 Wavelet             [illegible]        327.[illegible]   13.6                    [illegible]
8x16 Fast LiftLT        9.54               312.66            13.21                   304.85

The fast LiftLT is comparable to the wavelet transform in coding gain and stopband attenuation
and significantly better in mirror frequency attenuation (a figure of merit related to
aliasing).

Reconstructed images for a standard 512x512 "Barbara" test image at 1:32
compression ratio are shown in Figure 9 for aesthetic and heuristic evaluation. Top left 21 is the
reconstructed image for the 8 x 8 DCT (27.28 dB PSNR); top right shows the result for the 8 x
16 LOT (28.71 dB PSNR); bottom left is the 9/7 tap wavelet reconstruction (27.58 dB PSNR);
and bottom right, 8x16 LiftLT (28.93 dB PSNR). The objective coding results for standard
512x512 "Lena," "Goldhill," and "Barbara" test images (PSNR in dB) are tabulated in Table 3:





        Lena                             Goldhill                         Barbara
Comp.   9/7 WL   8x8    8x16   8x16      9/7 WL   8x8    8x16   8x16      9/7 WL   8x8    8x16   8x16
Ratio   SPIHT    DCT    LOT    LiftLT    SPIHT    DCT    LOT    LiftLT    SPIHT    DCT    LOT    LiftLT
8       40.41    39.91  40.02  40.21     36.55    36.25  36.56  36.56     36.41    36.31  37.22  37.57
16      37.21    36.38  36.69  37.11     33.13    32.76  33.12  33.22     31.4     31.11  32.52  32.82
32      34.11    32.9   33.49  34        30.56    30.07  30.52  30.63     27.58    27.28  28.71  28.93
64      31.1     29.67  30.43  30.9      28.48    27.93  28.34  28.54     24.86    24.58  25.66  25.93
100     29.35    27.8   28.59  29.03     27.38    26.65  27.08  27.28     23.76    23.42  24.32  24.5
128     28.38    26.91  27.6   28.12     26.73    26.01  26.46  26.7      23.35    22.68  23.36  23.47

PSNR is an acronym for peak signal to noise ratio and represents the logarithm of the ratio of
maximum amplitude squared to the mean square error of the reconstructed signal expressed in
decibels (dB).
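
The PSNR definition above can be written as a short function. The sketch below assumes 8-bit samples, so that the maximum amplitude is 255, and takes the images as flat sequences of pixel values; the function name `psnr` and the sample values are illustrative.

```python
import math

def psnr(original, reconstructed, peak=255):
    """10*log10(peak^2 / MSE) in dB for two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf   # identical images: no reconstruction error
    return 10.0 * math.log10(peak * peak / mse)

# Every pixel off by one gray level gives an MSE of 1.
value = psnr([10, 20, 30, 40], [11, 19, 31, 39])
```

With an error of one gray level per pixel the result is 20 log10(255), about 48.13 dB, which gives a sense of scale for the 23 to 40 dB figures in Table 3.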

The LiftLT outperforms its block transform relatives for all test images at all bit
rates. Compared to the wavelet transform, the LiftLT is quite competitive on smooth images,
about 0.2 dB below on Lena. However, for more complex images such as Goldhill or Barbara,
the LiftLT consistently surpasses the 9/7-tap wavelet. The PSNR improvement can reach as high
as 1.5 dB.
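
The size of the improvement over the wavelet coder can be read directly from the Barbara columns of Table 3. The sketch below copies those two columns (reading the entry printed as "24,86" as 24.86, which is an assumption) and computes the LiftLT gain at each compression ratio; the variable names are illustrative.

```python
RATIOS = [8, 16, 32, 64, 100, 128]
# PSNR in dB for the Barbara test image, taken from Table 3
BARBARA_WAVELET = [36.41, 31.40, 27.58, 24.86, 23.76, 23.35]
BARBARA_LIFTLT = [37.57, 32.82, 28.93, 25.93, 24.50, 23.47]

# LiftLT gain over the 9/7 wavelet at each compression ratio, in dB
gains = {r: round(l - w, 2)
         for r, w, l in zip(RATIOS, BARBARA_WAVELET, BARBARA_LIFTLT)}
best_ratio = max(gains, key=gains.get)
print(best_ratio, gains[best_ratio])  # 16 1.42
```

On these figures the largest gain is a little over 1.4 dB, at a compression ratio of 1:16, and the LiftLT is ahead of the wavelet at every listed ratio for this image.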

Figure 9 also shows pictorially the reconstruction performance in Barbara images
at 1:32 compression ratio for heuristic comparison. The visual quality of the LiftLT reconstructed
image is noticeably superior. Blocking is completely avoided whereas ringing is reasonably
contained. Top left: 8x8 DCT, 27.28 dB. Top right: 8x16 LOT, 28.71 dB. Bottom left: 9/7-tap
wavelet, 27.58 dB. Bottom right: 8x16 LiftLT, 28.93 dB. Visual inspection indicates that the
LiftLT coder gives at least as good performance as the wavelet coder. The appearance of
blocking artifacts in the DCT reconstruction (upper left) is readily apparent. The LOT transform
result (upper right) suffers visibly from the same artifacts even though it is lapped. In addition, it
is substantially more complex and therefore slower than the DCT transform. The wavelet
transform reconstruction (lower left) shows no blocking and is of generally high quality for this
level of compression. It is faster than the LOT but significantly slower than the DCT. Finally, the
results of the LiftLT transform are shown at lower right. Again, it shows no blocking artifacts,
and the picture quality is in general comparable to that of the wavelet transform reconstruction,
while its speed is very close to that of the bare DCT.



